Combined Gesture-Speech Analysis and Synthesis

Authors

  • Mehmet Emre Sargin
  • Ferda Ofli
  • Yelena Yasinnik
  • Oya Aran
  • Alexey Karpov
  • Stephen Wilson
  • Yucel Yemez
  • Engin Erzin
  • Murat Tekalp
Abstract

Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion, as well as between speech and facial expressions, have been widely studied, relatively little work has investigated the correlations between speech and gesture. Detection and modelling of the head, hand and arm gestures of a speaker have been studied extensively in [3]-[6], and these gestures were shown to carry linguistic information [7],[8]. A typical example is the head gesture made while saying "yes". In this project, the correlation between gestures and speech is investigated. Speech features are selected as Mel Frequency Cepstrum Coefficients (MFCC). Gesture features are composed of the positions of the hand and elbow, together with global motion parameters calculated across the head region. Prior to the detection of gestures, discrete symbol sets for gestures are determined manually, and for each symbol a model is generated based on the calculated features. Using these models, the sequence of gesture features is clustered and probable gestures are detected. The correlation between gestures and speech is modelled by examining co-occurring speech and gesture patterns. This correlation is used to fuse the gesture and speech modalities for edutainment applications (e.g. video games, 3-D animations), where the natural gestures of talking avatars are animated from speech.
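The co-occurrence modelling step described above, once both streams have been reduced to discrete symbols, can be sketched as a simple joint-count model. This is only an illustrative sketch, not the paper's actual method: the symbol streams, alphabet sizes, and the `cooccurrence_model` helper are hypothetical assumptions for demonstration.

```python
import numpy as np

def cooccurrence_model(gesture_symbols, speech_symbols, n_gesture, n_speech):
    """Estimate P(speech symbol | gesture symbol) from temporally aligned
    discrete symbol streams by counting co-occurrences."""
    counts = np.zeros((n_gesture, n_speech))
    for g, s in zip(gesture_symbols, speech_symbols):
        counts[g, s] += 1
    # Normalize each gesture row into a conditional distribution,
    # guarding against gesture symbols that never occur.
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

# Toy aligned streams: gesture symbol 0 (e.g. a head nod) always
# co-occurs with speech symbol 1 in this made-up example.
gestures = [0, 0, 1, 0, 1, 1, 0]
speech   = [1, 1, 0, 1, 0, 0, 1]
model = cooccurrence_model(gestures, speech, n_gesture=2, n_speech=2)
print(model[0, 1])  # → 1.0
```

A model like this can then drive synthesis: given an observed speech symbol sequence, the most probable gesture symbols are those maximizing the learned conditional probabilities.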

Related Articles

Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures

We propose a framework for joint analysis of speech prosody and arm motion towards automatic synthesis and realistic animation of beat gestures from speech prosody and rhythm. In the analysis stage, we first segment motion capture data and speech audio into gesture phrases and prosodic units via temporal clustering, and assign a class label to each resulting gesture phrase and prosodic unit. We...


CSLDS: Chinese Sign Language Dialog System

This paper presents a Chinese sign language dialog system (CSLDS) based on the techniques of large-vocabulary continuous Chinese sign language recognition (CSLR) and Chinese sign language synthesis (CSLS). The system demonstrates advanced gesture recognition and synthesis technology and can be applied to more powerful systems combined with speech recognition and synthesis technology, which the...


Text to Avatar in Multi-modal Human Computer Interface

In this paper, we present a new text-driven avatar system, which consists of three major components, a text-to-speech (TTS) unit, a speech driven facial animation (SDFA) unit and a text-to-sign language (TTSL) unit. A new visual prosody time control model and an integrated learning framework are proposed to realize synchronization among speech synthesis, face animation and gesture animation, wh...


The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension

During face-to-face communication, one not only hears speech but also sees a speaker's communicative hand movements. It has been shown that such hand gestures play an important role in communication, with the two modalities influencing each other's interpretation. A gesture typically overlaps temporally with coexpressive speech, but the gesture is often initiated before (but not after) the coe...


Infants temporally coordinate gesture-speech combinations before they produce their first words

This study explores the patterns of gesture and speech combinations from the babbling period to the one-word stage and the temporal alignment between the two modalities. The communicative acts of four Catalan children at 0;11, 1;1, 1;3, 1;5, and 1;7 were gesturally and acoustically analyzed. Results from the analysis of a total of 4,507 communicative acts extracted from approximately 24 h of at...



Journal:

Volume   Issue 

Pages  -

Publication date: 2005